The qgam R package provides methods for fitting spline-based additive quantile regression models, using the approach described in Fasiolo et al. (2017). It is convenient to use qgam together with mgcViz, which extends the basic visualizations provided by mgcv.

The main fitting functions are:

- qgamV, which fits an additive quantile regression model for a single quantile;
- mqgamV, which fits the same model to several quantiles at once.

Basic example: income vs age

We load the data and fit a median model:

library(mgcViz)
library(SemiPar)
data(age.income)
age.income$income <- exp(age.income$log.income)

fitQ <- qgamV(income ~ age, data = age.income, qu = 0.5)   
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5........done

We then predict and plot the fit:

ft <-  predict(fitQ, se = TRUE)

ord <- order(age.income$age)
plot(age.income$age, age.income$income, xlab = "Age", 
     ylab = "Income (CAD)", col = "grey")
lines(age.income$age[ord], ft$fit[ord] , col = 1) 
lines(age.income$age[ord], (ft$fit + 2 * ft$se.fit)[ord], col = 1, lty = 2)
lines(age.income$age[ord], (ft$fit - 2 * ft$se.fit)[ord], col = 1, lty = 2)

We can look at p-values and other summaries:

summary(fitQ)
## 
## Family: elf 
## Link function: identity 
## 
## Formula:
## income ~ age
## 
## Parametric coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   440368     100598   4.378  1.2e-05 ***
## age            10157       2697   3.766 0.000166 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## R-sq.(adj) =  0.0791   Deviance explained = 5.53%
## -REML = 2926.1  Scale est. = 1         n = 205

The relation between median income and age is clearly non-linear:

fitQ <- qgamV(income ~ s(age), data = age.income, qu = 0.5)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5.........done
ft <-  predict(fitQ, se = TRUE)

plot(age.income$age, age.income$income, xlab = "Age",
     ylab = "Income (CAD)", col = "grey")
lines(age.income$age[ord], ft$fit[ord], col = 2)
lines(age.income$age[ord], (ft$fit + 2 * ft$se.fit)[ord], col = 2, lty = 2)
lines(age.income$age[ord], (ft$fit - 2 * ft$se.fit)[ord], col = 2, lty = 2)

Now we fit multiple quantiles at once, then predict and plot the fits:

fitQ <- mqgamV(income ~ s(age), data = age.income, qu = c(0.1, 0.25, 0.5, 0.75, 0.9))
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5.........done 
## qu = 0.25........done 
## qu = 0.75...........done 
## qu = 0.1.........done 
## qu = 0.9.....................done
plot(age.income$age, age.income$income, xlab = "Age", ylab = "Income (CAD)", col = "grey")

for(ii in 1:5){
  ft <-  predict(fitQ[[ii]], se = TRUE)
  lines(age.income$age[ord], ft$fit[ord], col = 1)
}

Notice that the fitted quantiles will cross somewhere:

newd <- data.frame(age = seq(22, 80, length.out = 1e3))
plot(age.income$age, age.income$income, xlab = "Age",
     ylab = "Income (CAD)", col = "grey", xlim = range(newd$age))
for(ii in 1:5){
  ft <- predict(fitQ[[ii]], newdata = newd, se = TRUE)
  lines(newd$age, ft$fit, col = 1)
}
rug(age.income$age)

Typically they cross far from the data (when you extrapolate), but they can also cross when you interpolate.
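A simple numerical check for crossing is to collect the predicted quantiles in a matrix, with columns ordered by quantile level, and verify that each row is non-decreasing. A minimal base-R sketch (using a toy matrix, not the fit above):

```r
# Toy matrix of predicted quantiles: one row per observation,
# columns ordered by increasing quantile level.
preds <- cbind(q10 = c(1.0, 2.0, 3.0),
               q50 = c(2.0, 1.9, 4.0),
               q90 = c(3.0, 4.0, 5.0))
# A row "crosses" if its quantiles are not monotone in the level
crossed <- apply(preds, 1, function(r) any(diff(r) < 0))
which(crossed)
## [1] 2
```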

The output of mqgamV is a list of QGAM models; each of them can be handled as usual:

summary(fitQ[[1]])
## 
## Family: elf 
## Link function: identity 
## 
## Formula:
## income ~ s(age)
## 
## Parametric coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   430445      28337   15.19   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##          edf Ref.df Chi.sq  p-value    
## s(age) 4.964  5.997  68.32 9.09e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.116   Deviance explained = 71.9%
## -REML = 2943.4  Scale est. = 1         n = 205

An additive example with four covariates

We simulate some data from the model: \[ y = f_0(x_0)+f_1(x_1)+f_2(x_2)+f_3(x_3)+e,\;\;\; e \sim N(0, 2) \] by doing

set.seed(2)
dat <- gamSim(1, n=1000, dist="normal", scale=2)[c("y", "x0", "x1", "x2", "x3")]
## Gu & Wahba 4 term additive model

We start by fitting a linear quantile model for the median:

fit1 <- qgamV(y ~ x0 + x1 + x2 + x3, data=dat, qu = 0.5)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5.........done
print(plot(fit1, allTerms = TRUE), pages = 1)

We use pages = 1 to plot on a single page, and allTerms = TRUE to plot the parametric effects as well (by default the plotting method shows only smooth or random effects).

Should we use a smooth effect of x0? If the effect of x0 were non-linear, we would expect the proportion of observations falling below the fit to depart from 0.5 as we move along x0. A rudimentary diagnostic plot is:

plot(dat$x0, sign(residuals(fit1)) + rnorm(nrow(dat), 0, 0.05), col = alpha(1, 0.4), pch = 16,
     ylab = "Residuals sign", xlab = "x0")
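The same idea can be made quantitative by binning the covariate and computing, within each bin, the fraction of residuals falling below zero; for an adequate median fit this should hover around 0.5. A self-contained sketch using simulated residuals with a leftover pattern (not the fit1 residuals above):

```r
set.seed(1)
x0  <- runif(500)
res <- sin(2 * pi * x0) + rnorm(500)  # residuals with a leftover pattern
bins <- cut(x0, breaks = seq(0, 1, by = 0.2), include.lowest = TRUE)
round(tapply(res < 0, bins, mean), 2) # departs from 0.5 where the pattern is
```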

But the residual pattern is more visible in the following plot:

check1D(fit1, "x0") + l_gridQCheck1D(qu = 0.5)

There is definitely a pattern here. An analogous plot along x2 also shows a residual pattern, hence we consider the model:

fit2 <- qgamV(y ~ s(x0) + x1 + s(x2) + x3, data=dat, qu = 0.5)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5.........done
check1D(fit2, "x0") + l_gridQCheck1D(qu = 0.5)

This looks much better, and leads to a much lower AIC:

AIC(fit1) - AIC(fit2)
## [1] 693.1062

We can plot all the smooth effects by doing:

print(plot(fit2), pages = 1)

To plot only the second smooth effect we do

print(plot(fit2, select = 2), pages = 1)

and adding allTerms = TRUE also shows the parametric effects:

print(plot(fit2, allTerms = TRUE), pages = 1)

Now we fit this model to multiple quantiles and plot the fitted effects:

fit <- mqgamV(y ~ s(x0) + x1 + s(x2) + x3, data=dat,
              qu = seq(0.1, 0.9, length.out = 5))
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5.........done 
## qu = 0.3........done 
## qu = 0.7........done 
## qu = 0.1........done 
## qu = 0.9.........done
print(plot(fit, allTerms = TRUE), pages = 1)

We can manipulate the five fitted quantile GAMs individually, for example:

print(plot(fit[[1]], allTerms = TRUE), pages = 1)

Random effect modelling: lexical decision task

In this experiment, participants are presented with a sequence of stimuli and have to decide, as quickly as possible, whether each stimulus is an existing word (e.g. house) or a non-existing word (e.g. pensour) by pressing one of two buttons. The variables we consider here are:

- RT: the log reaction time;
- Trial: the rank of the trial in the experimental list;
- NativeLanguage: whether the participant is a native English speaker;
- Length: the length of the word in letters;
- Frequency: the log frequency of the word;
- Subject and Word: identifiers of the participant and of the stimulus.

We might be interested in modelling the relation between the response time and Length, Frequency, NativeLanguage and Trial, while controlling for Word and Subject. We start by loading the data and fitting a simple QGAM model for the median:

library(languageR)
data(lexdec)

lexdec$RT0 <- exp( lexdec$RT ) # quantile regression does not assume normality, so no need to model log-responses

fit <- qgamV(RT0 ~ s(Trial) + NativeLanguage + s(Length, k = 5) + s(Frequency),
            data = lexdec, qu = 0.5)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5...........done
print(plot(fit), pages = 1)

It is natural to ask whether we should control for Subject and for Word:

# check1D(fit, "Subject") would not work because it was not included in the original fit
check1D(fit, lexdec$Subject) + l_gridQCheck1D(qu = 0.5)

check1D(fit, lexdec$Word) + l_gridQCheck1D(qu = 0.5)

It seems so, hence we refit using two random effects:

fit2 <- qgamV(RT0 ~ s(Trial) + NativeLanguage + s(Length, k = 5) +
                    s(Frequency) + s(Subject, bs = "re") + s(Word, bs = "re"),
            data = lexdec, qu = 0.5)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5..............done
check1D(fit2, "Subject") + l_gridQCheck1D(qu = 0.5)

check1D(fit2, "Word") + l_gridQCheck1D(qu = 0.5)

AIC(fit) - AIC(fit2)
## [1] 831.7504

We achieve a lower AIC and the residual checks look better (especially the one for Subject).

A potentially interesting question is then: is the effect of Trial different for native and non-native speakers? We can verify this by using a by-factor smooth:

fit3 <- qgamV(RT0 ~  s(Trial, by = NativeLanguage) + # <- by-factor smooth
                    NativeLanguage + s(Length, k = 5) + s(Frequency) +
                    s(Subject, bs = "re") + s(Word, bs = "re"),
             data = lexdec, qu = 0.5)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5..........done
print(plot(fit3), pages = 1)

We can look directly at the difference between the two by-factor smooths:

s1 <- sm(fit3, 1)
s2 <- sm(fit3, 2)
plotDiff(s1, s2) + l_ciPoly() + l_fitLine()

There might be something there, but the difference is not very strong.

Now that we have converged on a (hopefully reasonable) model, we can fit it to several quantiles:

fit5 <- mqgamV(RT0 ~ s(Trial, by = NativeLanguage) + NativeLanguage + s(Length, k = 5) +
                + s(Frequency) + s(Subject, bs = "re") + s(Word, bs = "re"),
                data = lexdec, qu = seq(0.2, 0.8, length.out = 5))
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5..........done 
## qu = 0.35.................done 
## qu = 0.65...........done 
## qu = 0.2...................done 
## qu = 0.8...........done
print(plot(fit5, allTerms = TRUE), pages = 2, ask = FALSE)

The effects are fairly stable across quantiles, although the effect of Frequency might be stronger for high quantiles. We can examine the fitted quantile models individually:

summary(fit5[[5]])
## 
## Family: elf 
## Link function: identity 
## 
## Formula:
## RT0 ~ s(Trial, by = NativeLanguage) + NativeLanguage + s(Length, 
##     k = 5) + +s(Frequency) + s(Subject, bs = "re") + s(Word, 
##     bs = "re")
## 
## Parametric coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           628.06      31.71   19.80   <2e-16 ***
## NativeLanguageOther   121.46      48.01    2.53   0.0114 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                                   edf Ref.df  Chi.sq  p-value    
## s(Trial):NativeLanguageEnglish  2.208  2.733   8.547   0.0247 *  
## s(Trial):NativeLanguageOther    1.195  1.355   0.487   0.5102    
## s(Length)                       1.003  1.004   0.760   0.3842    
## s(Frequency)                    2.789  3.062  36.604 7.26e-08 ***
## s(Subject)                     18.483 19.000 886.997  < 2e-16 ***
## s(Word)                        43.904 76.000 207.497 3.22e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.438   Deviance explained = 46.8%
## -REML =  10834  Scale est. = 1         n = 1659

Setting the loss-smoothing parameter and checking convergence

Let’s simulate some data:

set.seed(5235)
n <- 1000
x <- seq(-3, 3, length.out = n)
X <- cbind(1, x, x^2)
beta <- c(0, 1, 1)
f <- drop(X %*% beta)
dat <- f + rgamma(n, 4, 1)
dataf <- data.frame(cbind(dat, x))
names(dataf) <- c("y", "x")

Assume that we want to estimate quantiles 0.05, 0.5 and 0.95:

fit <- mqgamV(y ~ s(x), data = dataf, qu = c(0.05, 0.5, 0.95), err = 0.05)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5........done 
## qu = 0.95.......done 
## qu = 0.05.................done
plot(x, dat, col = "grey", ylab = "y")
lines(x, f + qgamma(0.95, 4, 1), lty = 2)
lines(x, f + qgamma(0.5, 4, 1), lty = 2)
lines(x, f + qgamma(0.05, 4, 1), lty = 2)
lines(x, predict(fit[[1]]), col = 2)
lines(x, predict(fit[[2]]), col = 2)
lines(x, predict(fit[[3]]), col = 2)

Let’s try to use several values of err:

lfit <- lapply(c(0.01, 0.05, 0.1, 0.2, 0.3, 0.5),
               function(.inp){
                 mqgamV(y ~ s(x), data = dataf, qu = c(0.05, 0.5, 0.95), err = .inp, 
                        aQgam = list(control = list("progress" = F)))
               })

plot(x, dat, col = "grey", ylab = "y", ylim = c(-2, 20))
colss <- rainbow(length(lfit))
for(ii in 1:length(lfit)){
  lines(x, predict(lfit[[ii]][[1]]), col = colss[ii])
  lines(x, predict(lfit[[ii]][[2]]), col = colss[ii])
  lines(x, predict(lfit[[ii]][[3]]), col = colss[ii])
}
lines(x, f + qgamma(0.95, 4, 1), lty = 2)
lines(x, f + qgamma(0.5, 4, 1), lty = 2)
lines(x, f + qgamma(0.05, 4, 1), lty = 2)

The bias increases with err and is upward (downward) for high (low) quantiles. The median fit is not much affected by err, and the bias really starts appearing for err > 0.1. Decreasing err tends to slow down computation:

system.time( fit1 <- qgamV(y ~ s(x), data = dataf, qu = 0.95, err = 0.05, 
                           aQgam = list(control = list("progress" = F))) )[[3]]
## [1] 0.264
system.time( fit2 <- qgamV(y ~ s(x), data = dataf, qu = 0.95, err = 0.001, 
                           aQgam = list(control = list("progress" = F))) )[[3]]
## [1] 25.444

Even worse, it can lead to numerical problems. Here we check that we have found the minimum of the calibration loss:

check(fit1$calibr, sel = 2)

check(fit2$calibr, sel = 2)

In the first case the loss looks smooth, with a single minimum; in the second case we see some instabilities. If the calibration loss looks like this, you generally have to increase err.

We can use check to get an estimate of the bias and information on the convergence of the smoothing parameter estimation routine:

check(fit1)

## Theor. proportion of neg. resid.: 0.95   Actual proportion: 0.951
## Integrated absolute bias |F(mu) - F(mu0)| = 0.001017127
## Method: REML   Optimizer: outer newton
## full convergence after 5 iterations.
## Gradient range [4.449725e-08,4.449725e-08]
## (score 3438.722 & scale 1).
## Hessian positive definite, eigenvalue range [1.428662,1.428662].
## Model rank =  10 / 10 
## 
## Basis dimension (k) check: if edf is close to k' (maximum possible edf) 
## it might be worth increasing k. 
## 
##      k'  edf
## s(x)  9 5.05

The second plot suggests that the actual bias is much lower than the bound err = 0.05. This is also supported by the first two lines of text, which report that 95.1% of the residuals are negative, very close to the theoretical 95%. The text also states that full convergence in smoothing parameter estimation has been achieved; it is important to check this.

In summary, practical experience suggests that:

- the default value of err works well in most cases;
- lowering err reduces the bias, but slows down computation and can lead to numerical instabilities;
- the output of check should be used to verify that the bias is acceptable and that full convergence was achieved.

Rainfall modelling in the Paraná state of Brazil

Here we are going to model weekly rainfall (mm/week) from over 600 weather stations in the Paraná state of Brazil. The data comes from the INLA R package. The meaning of most variables should be evident.

library(mgcViz)
load("data/parana.rda")

plot(parana[parana$TIME==1, ]$LO, parana[parana$TIME==1, ]$LA, xlab = "LO", ylab = "LA") 

We fit a median quantile GAM with an isotropic effect for longitude and latitude, a cyclic effect for the time of the year and smooth effects for distance from the sea and elevation:

fit <- qgamV(PREC ~ s(LO, LA, k = 25) + s(seaDist) + s(TIME, bs = "cc") + s(EL), 
             data = parana, 
             qu = 0.5)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5..............done
print(plot(fit), pages = 1)

Here using an isotropic smooth for LO and LA makes sense, because the two variables are on similar scales. We can still check whether a tensor product effect would lead to a different fit:

fit2 <- qgamV(PREC ~ te(LO, LA, k = c(5, 5)) + s(seaDist) + s(TIME, bs = "cc") + s(EL), 
                       data = parana, 
                       qu = 0.5)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5..............done
library(gridExtra)
pl1 <- plot(sm(fit, 1)) + l_fitRaster() + l_fitContour() + theme(legend.position="bottom")
pl2 <- plot(sm(fit2, 1)) + l_fitRaster() + l_fitContour() + theme(legend.position="bottom")

grid.arrange(grobs = list(pl1$ggObj, pl2$ggObj), ncol = 2)

The two spatial effects are completely different! What happened is that the tensor product smooth got partially confounded with the distance-from-the-sea effect. The te smooth was more prone to this because seaDist varies mostly along LO, and it is mostly the marginal effect of LO that ended up offsetting the effect of seaDist. In general, if your model includes a bivariate effect s(x, y) or te(x, y), you have to be very careful when adding an extra effect \(s(f(x, y))\), where \(f\) is some fixed function (seaDist here).
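The mechanism can be illustrated in miniature: when an extra covariate is an exact function of the coordinates already entering a flexible surface, the two terms compete for the same signal. In the linear extreme the fit becomes rank deficient (a base-R sketch with toy data, unrelated to parana):

```r
set.seed(1)
x <- runif(100); y <- runif(100)
s <- x                          # a "seaDist"-like covariate: a function of (x, y)
m <- lm(rnorm(100) ~ x + y + s)
coef(m)["s"]                    # NA: s is perfectly confounded with x
```

With smooth terms the confounding is rarely this exact, so instead of an NA coefficient one gets effects that trade off against each other, as seen above.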

We can of course fit this model to several quantiles:

fitM <- mqgamV(PREC ~ s(LO, LA, k = 25) + s(seaDist) + s(TIME, bs = "cc") + s(EL), 
               data = parana, 
               qu = seq(0.1, 0.9, length.out = 5), 
               err = 0.1)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5.............done 
## qu = 0.3.................done 
## qu = 0.7...................done 
## qu = 0.1.......................done 
## qu = 0.9.............done
plot(fitM, select = 1)

print(plot(fitM, select = 2:4), pages = 1)

Notice that the spatial effect seems much stronger for high quantiles than for low ones. The same is true for distance from the sea and seasonality (TIME), while the effect of elevation is not significant for qu = 0.9. We can visualize the spatial effect in 3D as follows:

plotRGL(sm(fitM[[5]], 1)) # This will not appear in the html file
## Warning in par3d(userMatrix = structure(c(1, 0, 0, 0, 0,
## 0.342020143325668, : font family "sans" not found, using "bitmap"

We might also wonder whether the spatial effect changes with time. To verify this, we construct a tensor product smooth between the 2D thin-plate-spline spatial effect and the cyclic effect of time. We simplify the model by removing the effect of seaDist.

fit4 <- qgamV(PREC ~ te(LO, LA, TIME, d = c(2, 1), k = c(20, 10), bs = c("tp", "cc")) + s(EL), 
              data = parana, 
              qu = 0.9, 
              err = 0.1)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.9..........done
plotSlice(sm(fit4, 1), 
          fix = list("TIME" = round(seq(1, 53, length.out = 6)))) + l_fitRaster()

You can plot any slice in 3D by doing:

plotRGL(sm(fit4, 1), fix = c("TIME" = 11)) # This will not appear in the html file

Dealing with heteroscedasticity

Let us simulate some data from a heteroscedastic model:

set.seed(651)
n <- 5000
x <- seq(-4, 3, length.out = n)
X <- cbind(1, x, x^2)
beta <- c(0, 1, 1)
sigma <- 1.2 + sin(2 * x)
f <- drop(X %*% beta)
dat <- f + rnorm(n, 0, sigma)
dataf <- data.frame(cbind(dat, x))
names(dataf) <- c("y", "x")

qus <- seq(0.05, 0.95, length.out = 10)
plot(x, dat, col = "grey", ylab = "y")
for(iq in qus){ lines(x, qnorm(iq, f, sigma)) }

We now fit ten quantiles between 0.05 and 0.95, using a quantile GAM with a scalar learning rate:

fit <- mqgamV(y~s(x, k = 20, bs = "cr"),
              data = dataf,
              qu = qus)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.55........done 
## qu = 0.45..........done 
## qu = 0.35.......done 
## qu = 0.65..........done 
## qu = 0.25.........done 
## qu = 0.75..............done 
## qu = 0.15........done 
## qu = 0.85..........done 
## qu = 0.95...............done 
## qu = 0.05...........................done
plot(x, dat, col = "grey", ylab = "y")
for(ii in 1:length(qus)){
 lines(x, qnorm(qus[ii], f, sigma), col = 2)
 lines(x, predict(fit[[ii]]))
}
legend("top", c("truth", "fitted"), col = 2:1, lty = rep(1, 2))

The fitted quantiles are close to the true ones, but their credible intervals don't vary much with x. Indeed, let's look at the intervals for quantile 0.95.

plot(x, dat, col = "grey", ylab = "y")
tmp <- predict(fit[[10]], se = TRUE)
lines(x, tmp$fit)
lines(x, tmp$fit + 3 * tmp$se.fit, col = 2)
lines(x, tmp$fit - 3 * tmp$se.fit, col = 2)

We can do better by letting the learning rate vary with the covariate. In particular, we can use one additive model for the quantile location and another for the scale, or learning rate.

fit <- qgamV(list(y~s(x, k = 20, bs = "cr"), ~ s(x, k = 20, bs = "cr")),
            data = dataf, qu = 0.95)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.95.............done
plot(x, dat, col = "grey", ylab = "y")
tmp <- predict(fit, se = TRUE)
lines(x, tmp$fit[ , 1])
lines(x, tmp$fit[ , 1] + 3 * tmp$se.fit[ , 1], col = 2)
lines(x, tmp$fit[ , 1] - 3 * tmp$se.fit[ , 1], col = 2)

References

Fasiolo, M., Wood, S. N., Zaffran, M., Nedellec, R. and Goude, Y. (2017). Fast calibrated additive quantile regression. arXiv preprint.